Southeast Sulawesi
Splits! A Flexible Dataset and Evaluation Framework for Sociocultural Linguistic Investigation
Caplan, Eylon, Chakraborty, Tania, Goldwasser, Dan
Variation in language use, shaped by speakers' sociocultural background and specific context of use, offers a rich lens into cultural perspectives, values, and opinions. However, the computational study of these Sociocultural Linguistic Phenomena (SLP) has often been limited to bespoke analyses of specific groups or topics, hindering the pace of scientific discovery. To address this, we introduce Splits!, a 9.7 million-post dataset from Reddit designed for systematic and flexible research. The dataset contains posts from over 53,000 users across 6 demographic groups, organized into 89 discussion topics to enable comparative analysis. We validate Splits! via self-identification and by successfully replicating several known SLPs from existing literature. We complement this dataset with a framework that leverages efficient retrieval methods to rapidly validate potential SLPs (PSLPs) by automatically evaluating whether a given hypothesis is supported by our data. Crucially, to distinguish between novel and obvious insights, the framework incorporates a human-validated measure of a hypothesis's ``unexpectedness.'' We demonstrate that the two-stage process reduces the number of statistically significant findings requiring manual inspection by a factor of 1.5-1.8x, streamlining the discovery of promising phenomena for further investigation.
- North America > United States > Washington > King County > Seattle (0.14)
- North America > United States > Texas > Travis County > Austin (0.14)
- Europe > Austria > Vienna (0.14)
- (19 more...)
- Media (1.00)
- Leisure & Entertainment (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Government (0.93)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)
SpineWave: Harnessing Fish Rigid-Flexible Spinal Kinematics for Enhancing Biomimetic Robotic Locomotion
He, Qu, Li, Weikun, Dai, Guangmin, Chen, Hao, Liu, Qimeng, Tian, Xiaoqing, You, Jie, Cui, Weicheng, Triantafyllou, Michael S., Fan, Dixia
Fish have endured millions of years of evolution, and their distinct rigid-flexible body structures offer inspiration for overcoming challenges in underwater robotics, such as limited mobility, high energy consumption, and adaptability. This paper introduces SpineWave, a biomimetic robotic fish featuring a fish-spine-like rigid-flexible transition structure. The structure integrates expandable fishbone-like ribs and adjustable magnets, mimicking the stretch and recoil of fish muscles to balance rigidity and flexibility. In addition, we employed an evolutionary algorithm to optimize the hydrodynamics of the robot, achieving significant improvements in swimming performance. Real-world tests demonstrated robustness and potential for environmental monitoring, underwater exploration, and industrial inspection. These tests established SpineWave as a transformative platform for aquatic robotics.
- Europe > United Kingdom > England (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.05)
- (5 more...)
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
Cahyawijaya, Samuel, Lovenia, Holy, Aji, Alham Fikri, Winata, Genta Indra, Wilie, Bryan, Mahendra, Rahmad, Wibisono, Christian, Romadhony, Ade, Vincentio, Karissa, Koto, Fajri, Santoso, Jennifer, Moeljadi, David, Wirawan, Cahya, Hudi, Frederikus, Parmonangan, Ivan Halim, Alfina, Ika, Wicaksono, Muhammad Satrio, Putra, Ilham Firdausi, Rahmadani, Samsul, Oenang, Yulianti, Septiandri, Ali Akbar, Jaya, James, Dhole, Kaustubh D., Suryani, Arie Ardiyanti, Putri, Rifki Afina, Su, Dan, Stevens, Keith, Nityasya, Made Nindyatama, Adilazuarda, Muhammad Farid, Ignatius, Ryan, Diandaru, Ryandito, Yu, Tiezheng, Ghifari, Vito, Dai, Wenliang, Xu, Yan, Damapuspita, Dyah, Tho, Cuk, Karo, Ichwanul Muslim Karo, Fatyanosa, Tirana Noor, Ji, Ziwei, Fung, Pascale, Neubig, Graham, Baldwin, Timothy, Ruder, Sebastian, Sujaini, Herry, Sakti, Sakriani, Purwarianti, Ayu
We present NusaCrowd, a collaborative initiative to collect and unify existing resources for Indonesian languages, including opening access to previously non-public resources. Through this initiative, we have brought together 137 datasets and 118 standardized data loaders. The quality of the datasets has been assessed manually and automatically, and their value is demonstrated through multiple experiments. NusaCrowd's data collection enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and the local languages of Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual automatic speech recognition benchmark in Indonesian and the local languages of Indonesia. Our work strives to advance natural language processing (NLP) research for languages that are under-represented despite being widely spoken.
- North America > United States > Texas > Dallas County > Dallas (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Timor-Leste (0.14)
- (64 more...)
- Law (0.67)
- Government (0.67)
- Information Technology > Services (0.67)
- (3 more...)
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Winata, Genta Indra, Aji, Alham Fikri, Cahyawijaya, Samuel, Mahendra, Rahmad, Koto, Fajri, Romadhony, Ade, Kurniawan, Kemal, Moeljadi, David, Prasojo, Radityo Eko, Fung, Pascale, Baldwin, Timothy, Lau, Jey Han, Sennrich, Rico, Ruder, Sebastian
Natural language processing (NLP) has a significant impact on society via technologies such as machine translation and search engines. Despite its success, NLP technology is only widely available for high-resource languages such as English and Chinese, while it remains inaccessible to many languages due to the unavailability of data resources and benchmarks. In this work, we focus on developing resources for languages in Indonesia. Despite being the second most linguistically diverse country, most languages in Indonesia are categorized as endangered and some are even extinct. We develop the first-ever parallel resource for 10 low-resource languages in Indonesia. Our resource includes datasets, a multi-task benchmark, and lexicons, as well as a parallel Indonesian-English dataset. We provide extensive analyses and describe the challenges when creating such resources. We hope that our work can spark NLP research on Indonesian and other underrepresented languages.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (30 more...)
Relation Detection for Indonesian Language using Deep Neural Network -- Support Vector Machine
Hasudungan, Ramos Janoah, Purwarianti, Ayu
Relation Detection is a task to determine whether two entities are related or not. In this paper, we employ neural network to do relation detection between two named entities for Indonesian Language. We used feature such as word embedding, position embedding, POS-Tag embedding, and character embedding. For the model, we divide the model into two parts: Front-part classifier (Convolutional layer or LSTM layer) and Back-part classifier (Dense layer or SVM). We did grid search method of neural network hyper parameter and SVM. We used 6000 Indonesian sentences for training process and 1,125 for testing. The best result is 0.8083 on F1-Score using Convolutional Layer as front-part and SVM as back-part.
- Asia > Indonesia > Java > Jakarta > Jakarta (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- (2 more...)